Domain-Independent Optimistic Initialization for Reinforcement Learning
Authors
Abstract
In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration. However, such an approach generally depends on the domain, viz., the scale of the rewards must be known, and the feature representation must have a constant norm. We present a simple approach that performs optimistic initialization with less dependence on the domain.
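To make the domain dependence concrete, the sketch below illustrates the conventional optimistic initialization the abstract contrasts itself with: a tabular Q-table initialized to an upper bound on the return, r_max / (1 - gamma). This is not the paper's proposed method; the tabular setting, `r_max`, `gamma`, and the helper names are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of conventional (domain-dependent) optimistic initialization
# for tabular Q-learning. r_max (an assumed known bound on rewards) and gamma
# (the discount factor) are illustrative parameters, not taken from the paper.

def optimistic_q_table(n_states, n_actions, r_max=1.0, gamma=0.99):
    """Initialize every Q-value to an upper bound on the return, r_max / (1 - gamma).

    Unvisited state-action pairs look maximally rewarding, so a greedy agent is
    driven to try them. Note that this requires knowing the reward scale in
    advance -- the kind of domain knowledge the abstract refers to.
    """
    upper_bound = r_max / (1.0 - gamma)
    return np.full((n_states, n_actions), upper_bound)

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One standard Q-learning update; the optimism decays as pairs are visited."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

With function approximation the same trick additionally requires the feature representation to have a constant norm, which is the second dependence the abstract mentions.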
Similar Resources
Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way and always follows a greedy policy with respect to its model. The only trick of the algorithm is that the model is initialized optimistically. We pro...
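For illustration only, the sketch below shows the general idea of an optimistic initial model in a flat (non-factored) MDP, in the spirit of R-max-style constructions. It is a simplification under stated assumptions, not the FOIM algorithm itself; `r_max` and the fictitious absorbing state are illustrative choices.

```python
import numpy as np

# An R-max/OIM-style sketch of optimistic model initialization for a flat MDP.
# FOIM operates on factored MDPs and is not reproduced here.

def optimistic_initial_model(n_states, n_actions, r_max=1.0):
    # Add one fictitious absorbing state (index n_states) that pays r_max forever.
    S = n_states + 1
    P = np.zeros((S, n_actions, S))   # transition probabilities P[s, a, s']
    R = np.full((S, n_actions), r_max)
    # Before any data is seen, every (s, a) is believed to jump to the fictitious
    # state, so a greedy policy on this model is driven to try unvisited actions.
    P[:, :, n_states] = 1.0
    return P, R
```

As real transitions are observed, the empirical counts replace this optimistic prior, and the greedy policy gradually becomes accurate on the visited part of the MDP.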
Exploiting locality of interactions using a policy-gradient approach in multiagent learning
In this paper, we propose a policy gradient reinforcement learning algorithm to address transition-independent Dec-POMDPs. This approach aims at implicitly exploiting the locality of interaction observed in many practical problems. Our algorithms can be described by an actor-critic architecture: the actor component combines natural gradient updates with a varying learning rate; the critic uses ...
Practicing Q-Learning
Q-Learning has gained increasing attention as a promising real-time learning scheme from delayed reinforcement. Being compact, model-free, and theoretically optimal, it is commonly preferred to AHC-Learning and its derivatives. However, it has long been noticed that theoretical optimality has to be sacrificed in order to meet the constraints of most applications. In this article we report on expe...
Coordination of independent learners in cooperative Markov games
In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination or the impact of exploration. The study of these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficul...
Complexity Analysis of Real-Time Reinforcement Learning
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm wa...
Journal: CoRR
Volume: abs/1410.4604
Pages: -
Publication year: 2015